Dynamic Cache Partitioning Based on the MLP of Cache Misses
Authors
Abstract
Dynamic partitioning of shared caches has been proposed to improve the performance of traditional eviction policies in modern multithreaded architectures. All existing Dynamic Cache Partitioning (DCP) algorithms work on the number of misses caused by each thread and treat all misses equally. However, it has been shown that cache misses have a different impact on performance depending on their distribution: clustered misses share their miss penalty, since they can be served in parallel, while isolated misses hurt performance more because the memory latency is not shared with other misses. We take this fact into account and propose a new DCP algorithm that weights misses according to their influence on performance. Our proposal improves over traditional eviction policies by up to 63.9% (10.6% on average) and outperforms previous DCP proposals by up to 15.4% (4.1% on average) on a four-core architecture, and it reaches the same performance as a 50% larger shared cache. Finally, we present a practical implementation of our proposal that requires less than 8KB of storage.
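As a rough illustration of the idea in the abstract (not the paper's exact mechanism; the function names, cost curves, and greedy allocator below are all assumptions), an MLP-aware partitioner can weight each miss by the inverse of its cluster size, so isolated misses dominate a thread's reported cost, and then hand out cache ways greedily to the thread whose weighted cost drops the most:

```python
def mlp_cost(cluster_sizes):
    """MLP-weighted miss cost: each of the s misses in a cluster of size s
    is weighted 1/s, since overlapping misses share one memory latency.
    An isolated miss (s = 1) therefore counts for a full latency."""
    per_miss_weights = [1.0 / s for s in cluster_sizes for _ in range(s)]
    return sum(per_miss_weights)


def greedy_partition(total_ways, cost_curves):
    """cost_curves[t][w-1] = predicted MLP-weighted cost of thread t when
    it owns w ways (non-increasing in w). Every thread keeps at least one
    way; the remaining ways go, one at a time, to the thread with the
    largest marginal cost reduction."""
    n = len(cost_curves)
    alloc = [1] * n
    for _ in range(total_ways - n):
        def gain(t):
            if alloc[t] >= len(cost_curves[t]):
                return float("-inf")  # thread already at its maximum
            return cost_curves[t][alloc[t] - 1] - cost_curves[t][alloc[t]]
        best = max(range(n), key=gain)
        alloc[best] += 1
    return alloc
```

For example, `mlp_cost([1, 4, 2])` is 3.0 (one latency per cluster), whereas a raw miss count would report 7; a thread whose misses are mostly isolated therefore looks more expensive to the partitioner, matching the intuition above.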
Similar papers
Enhancing MLP: Runahead Execution and Related Techniques
The growing memory wall makes speedups increasingly difficult to achieve on applications that exhibit difficult-to-predict memory access patterns. The problem is that although modern processors provide multiple high-bandwidth execution units, applications that experience frequent cache misses only execute with high IPC in the periods between misses. As main memory latencies increase from 2...
Performance Modeling of Memory Latency Hiding Techniques
Due to the ever-increasing computational power of contemporary microprocessors, the execution time spent on actual arithmetic computations (i.e., computations not involving slow memory operations such as cache misses) is significantly reduced. Therefore, for memory intensive workloads, it is more important to overlap multiple cache misses than to overlap slow memory operations with other comput...
MLP yes! ILP no!
Problem Description: It should be well known that processors are outstripping memory performance: specifically that memory latencies are not improving as fast as processor cycle time or IPC or memory bandwidth. Thought experiment: imagine that a cache miss takes 10000 cycles to execute. For such a processor instruction level parallelism is useless, because most of the time is spent waiting for ...
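The arithmetic behind this thought experiment is easy to check. A minimal sketch, where only the 10000-cycle miss latency comes from the text and the miss count and compute cycles are assumed numbers:

```python
MISS_LATENCY = 10_000  # cycles per miss, as in the thought experiment
misses = 100           # assumed number of cache misses
compute = 20_000       # assumed cycles of useful computation

serial = compute + misses * MISS_LATENCY       # no overlap at all
ilp_2x = compute // 2 + misses * MISS_LATENCY  # double the ILP
mlp_4x = compute + misses * MISS_LATENCY // 4  # overlap 4 misses at a time
```

Doubling ILP saves only 10,000 of roughly a million cycles, while an MLP of 4 removes 750,000: when miss latency dominates, memory-level parallelism is what matters.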
Reduction in Cache Memory Power Consumption based on Replacement Quantity
Power consumption is one of today's key design concerns, so reducing it plays a considerable role in system development. Previous studies have shown that cache memories account for approximately 50% of total power consumption. There is a direct relationship between power consumption and the number of replacements made in the cache: the fewer replacements there are, the less...
Journal: Trans. HiPEAC
Volume: 3
Issue: -
Pages: -
Publication year: 2011